dominant eigenvector
Solving Interpretable Kernel Dimensionality Reduction
Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition.
Coordinate-wise Power Method
Qi Lei, Kai Zhong, Inderjit S. Dhillon
In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all the coordinates of the iterate, which is essential for its convergence analysis. However, different coordinates converge to the optimal value at different speeds.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- (2 more...)
Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective
Zhang, Kaicheng, Deidda, Piero, Higham, Desmond, Tudisco, Francesco
Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. While these metrics are related to oversmoothing, we argue they have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks and under somewhat strict conditions on the norm of network weights and feature representations. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide theoretical support for this approach, demonstrating that the numerical rank of feature representations converges to one for a broad family of nonlinear activation functions under the assumption of nonnegative trained weights. To the best of our knowledge, this is the first result that proves the occurrence of oversmoothing without assumptions on the boundedness of the weight matrices. Along with the theoretical findings, we provide extensive numerical evaluation across diverse graph architectures. Our results show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that a significant drop in the rank aligns closely with performance degradation, even in scenarios where energy metrics remain unchanged.
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- (3 more...)
Difference vs. Quotient: A Novel Algorithm for Dominant Eigenvalue Problem
The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous machine learning applications. Traditional approaches predominantly rely on the constrained Quotient formulation, which underpins most existing methods. However, these methods often suffer from challenges related to computational efficiency and dependence on spectral prior knowledge. This paper introduces a novel perspective by reformulating the eigenvalue problem using an unconstrained Difference formulation. This new approach sheds light on classical methods, revealing that the power method can be interpreted as a specific instance of Difference of Convex Algorithms. Building on this insight, we develop a generalized family of Difference-Type methods, which encompasses the power method as a special case. Within this family, we propose the Split-Merge algorithm, which achieves maximal acceleration without spectral prior knowledge and operates solely through matrix-vector products, making it both efficient and easy to implement. Extensive empirical evaluations on both synthetic and real-world datasets highlight that the Split-Merge algorithm achieves over a $\boldsymbol{10\times}$ speedup compared to the basic power method, offering significant advancements in efficiency and practicality for large-scale machine learning problems.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- (3 more...)
Solving Interpretable Kernel Dimensionality Reduction
Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition.
Coordinate-wise Power Method Qi Lei
In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all the coordinates of the iterate, which is essential for its convergence analysis. However, different coordinates converge to the optimal value at different speeds. Our proposed algorithm, which we call coordinate-wise power method, is able to select and update the most important k coordinates in O(kn) time at each iteration, where n is the dimension of the matrix and k apple n is the size of the active set. Inspired by the "greedy" nature of our method, we further propose a greedy coordinate descent algorithm applied on a non-convex objective function specialized for symmetric matrices. We provide convergence analyses for both methods. Experimental results on both synthetic and real data show that our methods achieve up to 23 times speedup over the basic power method. Meanwhile, due to their coordinate-wise nature, our methods are very suitable for the important case when data cannot fit into memory. Finally, we introduce how the coordinatewise mechanism could be applied to other iterative methods that are used in machine learning.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering
Lebeau, Hugo, Chatelain, Florent, Couillet, Romain
The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spike random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix and the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.06)
- North America > United States > Ohio > Cuyahoga County > Shaker Heights (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
Massively Scalable Inverse Reinforcement Learning in Google Maps
Barnes, Matt, Abueg, Matthew, Lange, Oliver F., Deeds, Matt, Trader, Jason, Molitor, Denali, Wulfmeier, Markus, O'Banion, Shawn
Optimizing for humans' latent preferences remains a grand challenge in route recommendation. Prior research has provided increasingly general techniques based on inverse reinforcement learning (IRL), yet no approach has been successfully scaled to world-sized routing problems with hundreds of millions of states and demonstration trajectories. In this paper, we provide methods for scaling IRL using graph compression, spatial parallelization, and problem initialization based on dominant eigenvectors. We revisit classic algorithms and study them in a large-scale setting, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. We leverage this insight in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our contributions culminate in a policy that achieves a 16-24% improvement in global route quality, and to the best of our knowledge, represents the largest instance of IRL in a real-world setting to date. Benchmark results show critical benefits to more sustainable modes of transportation, where factors beyond journey time play a substantial role. We conclude by conducting an ablation study of key components, presenting negative results from alternative eigenvalue solvers, and identifying opportunities to further improve scalability via IRL-specific batching strategies.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
- (17 more...)
- Leisure & Entertainment > Games > Computer Games (0.67)
- Transportation > Ground > Road (0.46)
- Information Technology > Services (0.41)
Blind Extraction of Equitable Partitions from Graph Signals
Scholkemper, Michael, Schaub, Michael
Finding equitable partitions is closely related to the extraction of graph symmetries and of interest in a variety of applications context such as node role detection, cluster synchronization, consensus dynamics, and network control problems. In this work we study a blind identification problem in which we aim to recover an equitable partition of a network without the knowledge of the network's edges but based solely on the observations of the outputs of an unknown graph filter. Specifically, we consider two settings. First, we consider a scenario in which we can control the input to the graph filter and present a method to extract the partition inspired by the well known Weisfeiler-Lehman (color refinement) algorithm. Second, we generalize this idea to a setting where only observe the outputs to random, low-rank excitations of the graph filter, and present a simple spectral algorithm to extract the relevant equitable partitions. Finally, we establish theoretical bounds on the error that this spectral detection scheme incurs and perform numerical experiments that illustrate our theoretical results and compare both algorithms.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Germany > North Rhine-Westphalia (0.04)
FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis
Gang, Arpita, Bajwa, Waheed U.
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample size in the modern day datasets have rendered the centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, those either overlook the uncorrelated feature learning aspect of the PCA, tend to have high communication overhead that makes them inefficient and/or lack `exact' or `global' convergence guarantees. To overcome these aforementioned issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)